Higher-order Occurrence Pooling on Mid- and Low-level Features: Visual Concept Detection
نویسندگان
چکیده
In object recognition, the Bag-of-Words model assumes: i) extraction of local descriptors from images, ii) embedding these descriptors by a coder to a given visual vocabulary space which results in so-called mid-level features, iii) extracting statistics from mid-level features with a pooling operator that aggregates occurrences of visual words in images into so-called signatures. As the last step aggregates only occurrences of visual words, it is called as First-order Occurrence Pooling. This paper investigates higher-order approaches. We propose to aggregate over co-occurrences of visual words, derive Bag-of-Words with Secondand Higher-order Occurrence Pooling based on linearisation of so-called Minor Polynomial Kernel, and extend this model to work with adequate pooling operators. For biand multi-modal coding, a novel higher-order fusion is derived. We show that the well-known Spatial Pyramid Matching and related methods constitute its special cases. Moreover, we propose Thirdorder Occurrence Pooling directly on local image descriptors and a novel pooling operator that removes undesired correlation from the image signatures. Finally, Uniand Bi-modal First-, Second-, and Third-order Occurrence Pooling are evaluated given various coders and pooling operators. The proposed methods are compared to other approaches (e.g. Fisher Vector Encoding) in the same testbed and attain state-of-the-art results.
منابع مشابه
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...
متن کاملComparison of mid-level feature coding approaches and pooling strategies in visual concept detection
Bag-of-Words lies at a heart of modern object category recognition systems. After descriptors are extracted from images, they are expressed as vectors representing visual word content, referred to as mid-level features. In this paper, we review a number of techniques for generating mid-level features, including two variants of Soft Assignment, Locality-constrained Linear Coding, and Sparse Codi...
متن کاملA Saliency Detection Model via Fusing Extracted Low-level and High-level Features from an Image
Saliency regions attract more human’s attention than other regions in an image. Low- level and high-level features are utilized in saliency region detection. Low-level features contain primitive information such as color or texture while high-level features usually consider visual systems. Recently, some salient region detection methods have been proposed based on only low-level features or hig...
متن کاملLearning Discriminative Visual N-grams from Mid-level Image Features
Mid-level image features have been shown to be helpful to bridge the semantic gap between low-level and high-level image representations. Many existing methods to learn mid-level visual elements consider each mid-level feature individually, and do not take their mutual relationships into account. We follow the intuitive idea that learning discriminative combinations of visual elements can help ...
متن کاملComparison of Mid - Level Feature Coding Approaches And Pooling Strategies in Visual
A number of techniques for generating mid-level features, including two variants of Soft Assignment, Locality-constrained Linear Coding, and Sparse Coding, are evaluated in the main document [1]. Pooling methods that aggregate mid-level features into vectors representing images like Average pooling, Max-pooling, and a family of likelihood inspired pooling strategies are scrutinised there. This ...
متن کامل